Per-Group ECMP for Multidestination Traffic in DCE/TRILL Networks

ABSTRACT

Consistent with embodiments of the present disclosure, systems and methods are disclosed for providing per-group ECMP for multidestination traffic in a DCE/TRILL network. Embodiments enable per-group load balancing of multidestination traffic in DCE/L2MP networks by creating a new IS-IS PDU to convey the affinity of the parent node for a given multicast group. For broadcast and unknown unicast flooded traffic, the load balancing may be done on a per-VLAN basis.

BACKGROUND

FabricPath/Transparent Interconnection of Lots of Links (“FP/TRILL”) is, among other benefits, a multipathing solution provided in a Layer 2 network. The multipathing solution is provided to unicast traffic by use of Equal-Cost Multi-Path Routing (“ECMP”). For unknown unicast, broadcast, and multicast traffic (henceforth referred to as multidestination traffic), multipathing is provided by using multiple trees, with each tree rooted at a different switch. The use of multiple trees may be expensive to maintain in terms of both software and hardware resources. Therefore, there exists a need to take the graphs constructed for unicast traffic and use them for multidestination traffic as well. This not only provides a way of using ECMP for multidestination traffic, but also uses fewer resources and unified control-plane constructs for unicast and multidestination traffic.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate various embodiments. In the drawings:

FIG. 1 is a block diagram of a network device operable according to embodiments of this disclosure;

FIG. 2 is a flow chart of a method according to embodiments of this disclosure;

FIG. 3 is a flow chart of a method according to embodiments of this disclosure;

FIG. 4 is a flow chart of a method according to embodiments of this disclosure.

DESCRIPTION OF EXAMPLE EMBODIMENTS

Overview

Consistent with embodiments of the present disclosure, systems and methods are disclosed for providing per-group ECMP for multidestination traffic in an FP/TRILL network. Embodiments enable per-group load balancing of multidestination traffic in FP/TRILL networks by creating a new IS-IS PDU to convey the affinity of the parent node for a given multicast group. For broadcast and unknown unicast flooded traffic, the load balancing may be done on a per-VLAN basis.

It is to be understood that both the foregoing general description and the following detailed description are examples and explanatory only, and should not be considered to restrict the application's scope, as described and claimed. Further, features and/or variations may be provided in addition to those set forth herein. For example, embodiments of the present disclosure may be directed to various feature combinations and sub-combinations described in the detailed description.

DETAILED DESCRIPTION

The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the following description to refer to the same or similar elements. While embodiments of this disclosure may be described, modifications, adaptations, and other implementations are possible. For example, substitutions, additions, or modifications may be made to the elements illustrated in the drawings, and the methods described herein may be modified by substituting, reordering, or adding stages to the disclosed methods. Accordingly, the following detailed description does not limit the disclosure. Instead, the proper scope of the disclosure is defined by the appended claims.

FP/TRILL may provide Layer 2 multicast multipathing by creating a plurality of trees for multidestination packets. This may provide for multipathing on a per-flow basis, but can require extra hardware and software resources. Classical Layer 2 Ethernet (using Per-VLAN Rapid Spanning Tree Protocol (“PVRSTP”), for instance) can construct a spanning tree for each VLAN and provide multiple paths, but only at a per-VLAN granularity. For Layer 3 multicast, there have been solutions that provide multicast multipathing on a per-group basis using Protocol Independent Multicast (“PIM”). These solutions require a tree, constructed by PIM protocol packets, for multicast, but allow for ECMP towards the PIM RP or source router. However, none of these prior approaches take the graphs constructed for unicast traffic and use them for multidestination traffic as well. The embodiments described herein use fewer resources and unified control-plane constructs for unicast and multidestination traffic.

It should be understood that FP/TRILL may use an extension of Intermediate System to Intermediate System (“IS-IS”) as its routing protocol. In described embodiments, it may be necessary to propagate a special type of Link State PDU (“LSP”) called a Group-Membership LSP (“GM-LSP”). A GM-LSP is an LSP that conveys the per-VLAN Layer 2 multicast addresses derived from IPv4 IGMP or IPv6 MLD notification messages received from attached nodes in the VLAN, indicating the location of listeners for these multicast addresses. Since LSPs are flooded by reliable flooding, all the FP/TRILL nodes in the network have information on where the different multicast group listeners are located in the network.

Unlike Layer 3 PIM multicast, the trees constructed for multipathing in FP/TRILL may not be driven by any control plane/data packets. In FP/TRILL, a certain number of FP/TRILL switches are selected as roots, and trees are created using those nodes (switches) as roots. The FP/TRILL dataplane packets may have extra encapsulation that identifies both the source of the multicast and the chosen tree that the multidestination packet must traverse.

Embodiments described herein, instead of using trees for multicast, may use ECMPs. The ECMPs constructed for unicast traffic forwarding may also be used for multidestination traffic forwarding. With ECMPs, a switch can have multiple parent switches for a given source switch (compared to trees, where there is just one parent). It should be ensured that a child switch receives at most one copy of the frame. Each switch should have only one parent, for a given source switch, which may forward a copy for a given group.

To accomplish this, embodiments herein propose a new PDU extension to IS-IS similar to the extensions proposed in “Extensions to IS-IS for Layer-2 Systems” (http://tools.ietf.org/html/draft-ietf-isis-layer2-03). In addition to GM-LSP flooding, a switch with interested multicast receivers for a multicast group indicates to each of the parent switches for a given source switch, via the PDU (also referred to as a Group Parent Select PDU or GPS-PDU), which one of its multiple parents (when multiple parents exist for the switch) should send traffic to it for a given multicast group address.
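
By way of a non-limiting illustration, a GPS-PDU might carry little more than the identity of the child switch, the source switch whose ECMP graph is referenced, the VLAN, the multicast group, and the chosen parent. The Python sketch below is hypothetical; the disclosure does not fix an encoding, and every field name is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GpsPdu:
    """Hypothetical Group Parent Select PDU; all field names are
    assumptions, since the disclosure does not fix an encoding."""
    child_switch: str    # switch with interested receivers for the group
    source_switch: str   # source switch whose ECMP graph is referenced
    vlan: int
    group_mac: str       # per-VLAN Layer 2 multicast group address
    chosen_parent: str   # the one parent that should forward for this group

# Example: switch S7 tells parent P2 (one of its ECMP parents toward
# source switch S1) to be the sole forwarder of the group in VLAN 10.
pdu = GpsPdu("S7", "S1", 10, "0100.5e01.0101", "P2")
print(pdu)
```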

In the forwarding plane, there will no longer be a requirement to use the tree identifier in the data packet (the FTAG in FP or the ingress RBridge ID in TRILL), as there are no longer different trees for multidestination multipathing. In some embodiments, a special tree identifier can be used to indicate that these data packets are using the enhanced protocol, to facilitate interoperability between embodiments and the present scheme used in FP/TRILL. Nicknames 0xFFC0 through 0xFFFF and 0x0000 are reserved nicknames in TRILL; one of these may be reserved for the special identifier.
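
For concreteness, a minimal sketch of how the special identifier might be tested in the forwarding path follows; the choice of 0xFFC0 as the marker is purely an assumption, as any of the reserved nicknames could serve.

```python
# TRILL reserves nicknames 0x0000 and 0xFFC0 through 0xFFFF; picking
# 0xFFC0 as the marker for the enhanced scheme is purely an assumption.
ECMP_MULTIDEST_NICKNAME = 0xFFC0

def is_reserved_nickname(nickname: int) -> bool:
    return nickname == 0x0000 or 0xFFC0 <= nickname <= 0xFFFF

def uses_enhanced_scheme(tree_id_field: int) -> bool:
    """True when the tree-identifier field carries the special marker
    instead of a real tree identifier."""
    return tree_id_field == ECMP_MULTIDEST_NICKNAME

assert is_reserved_nickname(ECMP_MULTIDEST_NICKNAME)
assert uses_enhanced_scheme(0xFFC0) and not uses_enhanced_scheme(0x0042)
```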

The outgoing interface list computed at an intermediate parent for a given multicast group may simply include the ECMP path that was signaled by the PDU. In the forwarding plane, the Incoming Interface Check (“IIC”) in FP, or the Reverse Path Forwarding check (“RPF check”) in TRILL, is modified such that the check is performed on a per-multicast-group, per-source-switch basis instead of the per-source-switch (or per-ingress-RBridge in TRILL) basis used in current FP/TRILL implementations.
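
A minimal sketch of the modified check follows, keyed on the (multicast group, source switch) pair rather than on the source switch alone; the table layout and all names are assumptions.

```python
# Allowed incoming interface per (group MAC, source switch) pair, as
# signaled by GPS-PDUs. A per-source-only check would key on the source
# switch alone; here the same source may use different links per group.
rpf_table = {
    ("0100.5e01.0101", "S1"): "eth1/1",
    ("0100.5e01.0102", "S1"): "eth1/2",  # same source, different parent link
}

def rpf_check(group_mac: str, source_switch: str, in_iface: str) -> bool:
    """Accept the frame only if it arrived on the interface chosen for
    this (group, source) pair; drop otherwise to avoid loops/duplicates."""
    return rpf_table.get((group_mac, source_switch)) == in_iface

assert rpf_check("0100.5e01.0101", "S1", "eth1/1")
assert not rpf_check("0100.5e01.0101", "S1", "eth1/2")
```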

The GPS-PDU may be sent between a switch and its parent switch in an ECMP graph, for each switch in the network. First, each switch in the network may choose a parent switch to accept the traffic for a given group. Next, the parent choice for a group is centralized at each switch.
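
One plausible local policy for spreading groups across the available parents is a deterministic hash of the group address, sketched below; the disclosure does not mandate any particular policy, and the CRC32 hash is used only for illustration.

```python
import zlib

def choose_parent(group_mac: str, parents: list) -> str:
    """Deterministically map a multicast group to one of the ECMP parents
    for a given source switch. Any stable hash works; CRC32 is used here
    only for illustration."""
    if not parents:
        raise ValueError("no ECMP parents for this source switch")
    index = zlib.crc32(group_mac.encode()) % len(parents)
    return sorted(parents)[index]

# Example: two ECMP parents toward source switch S1; different groups
# may land on different parents, giving per-group multipathing.
print(choose_parent("0100.5e01.0101", ["P1", "P2"]))
print(choose_parent("0100.5e01.0102", ["P1", "P2"]))
```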

FIG. 1 is a block diagram of a system including network device 100. Embodiments of per-group ECMP for multidestination traffic in FP/TRILL networks may be implemented in one or more network devices, such as network device 100 of FIG. 1. In embodiments, network device 100 may be a network switch, network router, or other suitable network device. Any suitable combination of hardware, software, or firmware may be used to implement embodiments of per-group ECMP for multidestination traffic in FP/TRILL networks. For example, embodiments of per-group ECMP for multidestination traffic in FP/TRILL networks may be implemented with network device 100 or any of other network devices 118. The aforementioned system, device, and processors are examples, and other systems, devices, and processors may comprise the aforementioned memory storage and processing unit, consistent with embodiments of per-group ECMP for multidestination traffic in FP/TRILL networks.

With reference to FIG. 1, a system consistent with embodiments of per-group ECMP for multidestination traffic in FP/TRILL networks may include a network device, such as network device 100. In a basic configuration, network device 100 may include at least one processing unit 102 and a system memory 104. Depending on the configuration and type of network device, system memory 104 may comprise, but is not limited to, volatile (e.g., random access memory (RAM)), non-volatile (e.g., read-only memory (ROM)), flash memory, or any combination. System memory 104 may include operating system 105, one or more programming modules 106, and may include program data 107. Operating system 105, for example, may be suitable for controlling network device 100's operation. Furthermore, embodiments of per-group ECMP for multidestination traffic in FP/TRILL networks may be practiced in conjunction with a graphics library, other operating systems, or any other application program, and are not limited to any particular application or system. This basic configuration is illustrated in FIG. 1 by those components within a dashed line 108.

Network device 100 may have additional features or functionality. For example, network device 100 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 1 by a removable storage 109 and a non-removable storage 110. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. System memory 104, removable storage 109, and non-removable storage 110 are all computer storage media examples (i.e., memory storage). Computer storage media may include, but is not limited to, RAM, ROM, electrically erasable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store information and which can be accessed by network device 100. Any such computer storage media may be part of device 100. Network device 100 may also have input device(s) 112 such as a keyboard, a mouse, a pen, a sound input device, a touch input device, etc. Output device(s) 114 such as a display, speakers, a printer, etc. may also be included. The aforementioned devices are examples and others may be used.

Network device 100 may also contain a communication connection 116 that may allow network device 100 to communicate with other network devices 118, such as over a network in a distributed network environment, for example, an intranet or the Internet. Communication connection 116 is one example of communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media. The term computer readable media as used herein may include both storage media and communication media.

As stated above, a number of program modules and data files may be stored in system memory 104, including operating system 105. While executing on processing unit 102, programming modules 106 may perform processes including, for example, one or more stages of methods 200, 300, or 400 as described below.

FIG. 2 is a flow chart illustrating the steps to support multipathing across multicast groups according to embodiments described herein. The method may begin at step 200, where the unicast ECMP graph is initially obtained. The method may then proceed to step 210 where, for each switch in the network, the available paths and the available parents for that switch are identified using the unicast ECMP graph.

The method may next proceed to step 220 where, as group membership information is learned (using IGMP or MLD), local policy information may be used to inform the parent switches which of them has been chosen as the parent for this group. The method may then advance to step 230, where the group membership information is flooded via GM-LSPs as required by FP/TRILL.

Next, at step 240, the parents and child may enforce the selection made using forwarding constructs. Finally, at step 250, if there is a change in the ECMP graph, needed adjustments may be made after re-obtaining the ECMP graph.
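
Read as pseudocode, steps 200 through 250 amount to the loop sketched below; every name is a stand-in for machinery described above, and the first-parent policy is an arbitrary placeholder.

```python
# Hypothetical end-to-end driver for steps 200-250 of FIG. 2; every helper
# and policy here is a stand-in, not a real FP/TRILL API.
def method_200(ecmp_parents, group_members):
    """ecmp_parents: switch -> ECMP parents from the unicast graph
    (steps 200/210); group_members: group -> switches with receivers."""
    chosen = {}  # (switch, group) -> single chosen parent
    for group, members in group_members.items():  # step 220: IGMP/MLD learning
        for switch in members:
            parents = ecmp_parents.get(switch, [])
            if parents:  # local policy placeholder: lowest-named parent
                chosen[(switch, group)] = sorted(parents)[0]
    # Step 230 (GM-LSP flooding) and step 240 (forwarding enforcement) would
    # distribute and install `chosen`; step 250 re-runs this on graph changes.
    return chosen

print(method_200({"S1": ["P2", "P1"]}, {"G1": ["S1"]}))
# -> {('S1', 'G1'): 'P1'}
```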

In some embodiments, there could be some multidestination groups, such as broadcast or floods due to an unknown L2 unicast packet. These multidestination packets share the same group address, and so they would not benefit from multipathing on a per-group basis. In these scenarios, the parent may be chosen based on VLAN for these special groups. This would be over and above the selection of a parent based on multicast groups for the same VLAN. Thus, multipathing for multicast groups within a given VLAN and multipathing for constant-group multidestination traffic across different VLANs may still both be achieved.
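
A minimal sketch of such per-VLAN parent selection follows, under the same hypothetical naming as the earlier sketches.

```python
def choose_flood_parent(vlan: int, parents: list) -> str:
    """Broadcast and unknown-unicast floods share one group address per
    VLAN, so the parent is spread across the ECMP paths by VLAN rather
    than by group."""
    if not parents:
        raise ValueError("no ECMP parents for this source switch")
    return sorted(parents)[vlan % len(parents)]

# VLANs 10 and 11 land on different parents: per-VLAN multipathing for
# the constant-group traffic classes.
assert choose_flood_parent(10, ["P1", "P2"]) == "P1"
assert choose_flood_parent(11, ["P1", "P2"]) == "P2"
```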

Current FP/TRILL-capable hardware may support embodiments of multipathing across groups without performing the IIC/RPF check. A small but significant ASIC change would be needed to perform the modified IIC/RPF checks using the multicast group address. This involves checking the (multicast group, source switch ID) information against the allowed list of incoming interfaces. This is a version of the Reverse Path Forwarding (RPF) check, and is meant to avoid loops and duplicates.

It should be noted that the current implementation of DCE/TRILL may provide multidestination multipathing at a flow level (at the extra cost of more trees, all paths not necessarily being used, etc.). The present disclosure simplifies hardware resources and allows for multidestination multipathing on a per-group level by enabling per-group load balancing of multidestination traffic in DCE/L2MP networks, using a new IS-IS PDU to convey the affinity of the parent node for a given multicast group. For broadcast and unknown unicast flooded traffic, load balancing may be done on a per-VLAN basis.

Enabling the use of the unicast ECMP graph for multidestination traffic may eliminate the software and hardware complexity of maintaining multiple trees for the load balancing of multidestination traffic. This also has the added benefit of faster convergence when there is a change in network topology.

In some embodiments, the receiver of the multicast traffic may need to ensure that it accepts traffic only on the interfaces on which its control plane, at a given time, indicates the traffic should be accepted. In these embodiments, an RPF check is required per group, per source switch. It should be understood that in most embodiments the RPF checks can be considered optional.

For example, say there are 100 switches in a network, and for multicast multipathing there are 5 different trees. In this example, there may be 100 multicast groups. The RPF check table in current implementations of DCE/TRILL would then require 500 entries in each of the switches. However, the number of RPF check entries needed in embodiments described herein is quite different. At a given switch, if for a given source switch there is only one parent, then only one entry in the RPF check table is needed.

If the table allows for masking of some lookup fields, then it would be possible to mask the group address and use only one entry. So for cases such as these, the described embodiments result in better table utilization. For situations where there is more than one parent switch for a given source switch, the number of RPF entries needed may be as great as the number of multicast groups. However, in the majority of topologies there will be only a small group of switches that have multiple parents as seen from a given switch where the RPF entries are installed, and so the RPF entries would scale reasonably.
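
The arithmetic above can be restated concretely: with a single parent per source switch, the group field can be wildcarded, collapsing many entries into one. A sketch of such a maskable lookup follows; the table structure is assumed.

```python
WILDCARD = "*"  # masked group field: entry applies to any group

# (group MAC or WILDCARD, source switch) -> allowed incoming interface
rpf_entries = {
    (WILDCARD, "S1"): "eth1/1",          # single-parent source: one masked entry
    ("0100.5e01.0101", "S2"): "eth1/2",  # multi-parent source: per-group rows
    ("0100.5e01.0102", "S2"): "eth1/3",
}

def lookup(group_mac, source_switch):
    """Exact (group, source) match first, then the masked entry."""
    return (rpf_entries.get((group_mac, source_switch))
            or rpf_entries.get((WILDCARD, source_switch)))

assert lookup("0100.5e01.0999", "S1") == "eth1/1"   # mask hit: any group
assert lookup("0100.5e01.0102", "S2") == "eth1/3"   # exact hit
```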

FIG. 3 is a flow chart illustrating embodiments of the present disclosure. Method 300 may begin at step 310, where a unicast ECMP graph may be obtained. Once the unicast ECMP graph is obtained, method 300 may proceed to step 320. At step 320, available paths for a plurality of network devices may be identified from the unicast ECMP graph. For example, there may be a plurality of paths over which data can travel from a first network device to a second network device.

Once available network paths have been determined, method 300 may proceed to step 330. At step 330, the parents for each of the network devices may be identified. Similarly, at step 340, group membership information for each of the identified network devices may be obtained. In some embodiments, the group membership information may be obtained using local policy information via the IGMP or MLD protocols. Single parent switches may be designated on a per-group basis.

Next, at step 350, the parents identified in the chosen group parent information derived from the group membership information are informed of their designated status. In some embodiments, this comprises sending a GPS-PDU between each of the plurality of network devices and its respective associated parent network devices. The GPS-PDU may indicate which of the multiple parents existing for a network device should receive traffic from the source switch for a given multicast group address. Following this, method 300 may proceed to step 360. At step 360, the chosen group parent information may be flooded to the other network devices via group membership LSPs (“GM-LSPs”).

Once the parent information has been distributed, method 300 may proceed to step 370 and enforce the selection of group information through forwarding constructs. In some embodiments, step 370 may include selecting an associated parent network device to accept traffic for a designated group. In some embodiments, enforcement may include dynamically adjusting the chosen group parent information upon notification of a change in the unicast ECMP graph. Furthermore, in some embodiments, the address associated with the designated group may be masked using any number of known methods.

FIG. 4 is a flow chart illustrating embodiments of the present disclosure. Method 400 may start at step 410, where a PDU extension is established as an extension to the IS-IS protocol. In some embodiments, an identifier may be established to alert switching devices that an enhanced protocol is employed. This identifier may be stored in space reserved for TRILL nicknames. Once the PDU extension has been established, method 400 may proceed to step 420. At step 420, the network may flood information associated with the first PDU extension to a plurality of switches in a network, wherein each of the plurality of switches has interested multicast receivers for a first group.

Here, in some embodiments, an outgoing interface list may be created at an intermediate parent switch. The outgoing interface list may include only the ECMP path signaled by the PDU. The list information may be employed to further modify an incoming interface check or a reverse path forwarding check, resulting in the check being done on a per-multicast-group, per-source-switch basis.
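
As a hypothetical illustration, the outgoing interface list at an intermediate parent reduces to the set of child links over which this switch was signaled as the chosen parent for a (group, source switch) pair; all names below are assumptions.

```python
from collections import defaultdict

# GPS-PDUs received at this parent: (group MAC, source switch) -> child links
oif_lists = defaultdict(set)

def on_gps_pdu(group_mac, source_switch, child_link):
    """A child elected this switch as parent for (group, source);
    record its link in the outgoing interface list."""
    oif_lists[(group_mac, source_switch)].add(child_link)

def outgoing_interfaces(group_mac, source_switch):
    """Forward a frame only on links whose children asked for it."""
    return oif_lists[(group_mac, source_switch)]

on_gps_pdu("0100.5e01.0101", "S1", "eth1/5")
print(outgoing_interfaces("0100.5e01.0101", "S1"))  # {'eth1/5'}
```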

Method 400 may then advance to step 430, where it may be indicated, via the PDU information, which of a plurality of parent switches should send traffic associated with a given multicast address.

Embodiments of the present disclosure provide many advantages over prior art systems, including aiding in multicast multipathing across groups in the same VLAN. Embodiments further provide important multicast features such as resiliency, faster convergence, and redundancy. Also, embodiments described herein do not require software to build and maintain multiple trees, and they allow multicast forwarding to be derived from unicast forwarding information.

Unlike current DCE/TRILL implementations, there is no requirement for embodiments of the present disclosure to compute multiple trees rooted at different switches for multidestination multipathing. If the underlying topology changes, then the link state protocol reconverges on the changed topology. If a given node determines that the list of parents for a given source switch has changed, it sends out GPS-PDUs to communicate the new mapping between the multicast groups and the parent interfaces.
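
A sketch of this reconvergence behavior follows; it re-maps groups only for source switches whose parent set changed, with the actual PDU transmission left out and all names assumed.

```python
import zlib

def on_reconvergence(old_parents, new_parents, groups_by_source):
    """Emit new (source, group, parent) bindings, but only for source
    switches whose ECMP parent set actually changed."""
    updates = []
    for source, parents in new_parents.items():
        if sorted(parents) == sorted(old_parents.get(source, [])):
            continue  # parent set unchanged; existing bindings remain valid
        for group in groups_by_source.get(source, []):
            parent = sorted(parents)[zlib.crc32(group.encode()) % len(parents)]
            updates.append((source, group, parent))  # would be sent as GPS-PDUs
    return updates

print(on_reconvergence({"S1": ["P1", "P2"]},
                       {"S1": ["P1", "P3"]},
                       {"S1": ["0100.5e01.0101"]}))
```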

Embodiments of the present disclosure, for example, are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to embodiments of this disclosure. The functions/acts noted in the blocks may occur out of the order shown in any flowchart. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.

While certain embodiments of the disclosure have been described, other embodiments may exist. Furthermore, although embodiments of the present disclosure have been described as being associated with data stored in memory and other storage mediums, data can also be stored on or read from other types of computer-readable media, such as secondary storage devices, like hard disks, floppy disks, or a CD-ROM, a carrier wave from the Internet, or other forms of RAM or ROM. Further, the disclosed methods' stages may be modified in any manner, including by reordering stages and/or inserting or deleting stages, without departing from the disclosure.

All rights including copyrights in the code included herein are vested in and are the property of the Applicant. The Applicant retains and reserves all rights in the code included herein, and grants permission to reproduce the material only in connection with reproduction of the granted patent and for no other purpose.

While the specification includes examples, the disclosure's scope is indicated by the following claims. Furthermore, while the specification has been described in language specific to structural features and/or methodological acts, the claims are not limited to the features or acts described above. Rather, the specific features and acts described above are disclosed as examples for embodiments of the disclosure.

What is claimed is:
1. A method comprising: obtaining a unicast ECMP graph; identifying available paths for a plurality of network devices from the unicast ECMP graph; identifying parents for each of the identified network devices; obtaining group membership information for each of the identified network devices; informing the identified parents of chosen group parent information derived from the group membership information; flooding chosen group parent information via group membership LSP (“GM-LSP”); and enforcing selection of group information through forwarding constructs.
2. The method of claim 1, further comprising obtaining group membership information using local policy information via one of: IGMP or MLD.
3. The method of claim 1, further comprising dynamically adjusting chosen group parent information upon notification of a change in the unicast ECMP graph.
4. The method of claim 1, further comprising sending a GPS-PDU between each of the plurality of network devices and their respective associated parent network devices.
5. The method of claim 4, further comprising each of the plurality of network devices selecting an associated parent network device to accept traffic for a designated group.
6. The method of claim 5, further comprising masking the address associated with the designated group.
7. The method of claim 1, further comprising ensuring that each of the plurality of network devices has only one associated parent network device for a given source switch.
8. The method of claim 7, further comprising forwarding the only one associated parent network device information for a given source switch.
9. A network device comprising: a processor configured to: obtain a unicast ECMP graph; perform load balancing based on a packet hash associated with the unicast ECMP graph; select a unicast path based on the load balancing; and determine, per group, based on the unicast ECMP graph, which of a plurality of parent switches can send traffic directed to each group such that a single parent switch exists for each source switch.
10. The network device of claim 9, wherein the processor is further configured to add an outgoing interface to a tree associated with the determined parent switch.
11. The network device of claim 10, wherein the processor is further configured to add a PDU extension to an IS-IS protocol.
 12. Amethod comprising: establishing a first PDU extension; floodinginformation associated with the first PDU extension to a plurality ofswitches in a network, wherein each of the plurality of switches hasinterested multicast receivers for a first group; and indicating via PDUinformation which of a plurality of parent switches should send trafficassociated with a given multicast address.
 13. The method of claim 12,further comprising establishing an identifier that alerts switchingdevices that an enhanced protocol is employed.
 14. The method of claim13, further comprising storing the identifier in space reserved forTRILL nicknames.
 15. The method of claim 12, further comprisingcomputing an outgoing interface list at an intermediate parent switch.16. The method of claim 15, wherein the outgoing interface list onlyincludes an ECMP path signaled by the PDU.
 17. The method of claim 16,further comprising modifying one of: an incoming interface check or areverse path forwarding check.
 18. The method of claim 17, wherein themodification of one of: the incoming interface check or the reverse pathforwarding check results in the check being done on a per-multicastgroup/per-source switch basis.
 19. The method of claim 12, furthercomprising receiving multicast packets for a multidestination group,wherein the multicast packets share an identical group address.
 20. Themethod of claim 19, further comprising selecting a parent switch for theidentical group address based on VLAN information associated with thegroup.